Disambiguation of morphological analysis in Bantu languages

Author

  • Arvi Hurskainen
Abstract

The paper describes problems in disambiguating the morphological analysis of Bantu languages, using Swahili as a test language. The main factors of ambiguity in this language group can be traced to the noun class structure on the one hand and to bi-directional word formation on the other. Word-forms are analyzed with SWATWOL, a morphological parsing program based on the two-level formalism. Disambiguation is carried out with the latest version (April 1996) of the Constraint Grammar Parser (CGP). Statistics on ambiguity are provided. Solutions for resolving different types of ambiguity are presented and demonstrated by examples from corpus text. Finally, statistics on the performance of the disambiguator are presented.

Introduction

There are five principal factors in Bantu languages which contribute to ambiguous analysis of word-forms. First, nouns are grouped into more than ten marked noun classes. The marking of these classes extends across the noun phrase, whereby the noun governs the choice of markers in dependent constituents. Second, verbs inflect stem-initially and mark the subject, object, and relative referent by prefixes, whereby the actual form of each prefix is governed by the noun class of the noun it refers to. Verb derivation further adds to the complexity of verbal morphology. Third, reduplication is a productive phenomenon. Because it cannot be described accurately in the lexicon, alternative ways of handling it are discussed. Fourth, the majority of Bantu languages have a tone system, but this is rarely indicated in writing, which adds to morphological ambiguity. Fifth, various semantic functions of word-forms are also a source of ambiguity. In this paper I shall discuss points one and two, using Swahili as a test language.

1 Morphological analysis

The morphological analysis of Swahili is carried out by SWATWOL, which is based on the two-level formalism (Koskenniemi 1983).
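As a rough illustration of why such an analyzer returns several readings for a single word-form, the sketch below looks up a verb's subject prefix in a small hand-written table. The table, the function name `subject_readings`, and the tag labels are assumptions made for this example only. They are not SWATWOL's lexicon, rules, or tag set, and the sketch uses plain string matching, not the two-level formalism.

```python
# Hypothetical, hand-written table of a few Swahili subject prefixes and the
# person / noun-class readings each one can carry. Illustrative only; this is
# not taken from SWATWOL.
SUBJECT_PREFIX_READINGS = {
    "ni": ["1SG"],
    "u": ["2SG", "CL3", "CL11"],
    "a": ["CL1"],
    "i": ["CL4", "CL9"],
    "ki": ["CL7"],
    "zi": ["CL10"],
}


def subject_readings(word_form):
    """Return (prefix, reading, remainder) for every table prefix that
    matches the beginning of the word-form."""
    analyses = []
    for prefix, readings in SUBJECT_PREFIX_READINGS.items():
        # Require some material after the prefix (tense marker, stem, ...).
        if word_form.startswith(prefix) and len(word_form) > len(prefix):
            for reading in readings:
                analyses.append((prefix, reading, word_form[len(prefix):]))
    return analyses


if __name__ == "__main__":
    # "unasoma" can mean "you are reading" or refer to a class 3 / class 11 subject.
    for analysis in subject_readings("unasoma"):
        print(analysis)
```

Every reading listed for a prefix is returned, so a single surface prefix such as `u` yields three candidate analyses. Choosing among them requires context, for example the noun class of the subject noun, which is the job of the disambiguation step.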
The application of the two-level formalism to Swahili has been in progress since 1987. After testing with a corpus of one million words, it has now reached a mature phase, with a recall of 99.8% in average running text and a precision of close to 100%. The performance of SWATWOL corresponds to what is reported of ENGTWOL, the morphological parser of English (Voutilainen et al 1992; Tapanainen and Järvinen 1994), and SWETWOL, the morphological analyzer of Swedish (Karlsson 1992). SWATWOL uses a two-level rule system for describing morphophonological variation, as well as a lexicon with 288 sub-lexicons. Unlike in languages with right-branching word formation, where word roots can be grouped together into a root lexicon, here word roots have been divided into several sub-lexicons. Because SWATWOL has been described in detail elsewhere (Hurskainen 1992), only a brief sketch of its parts is given here.

1.1 SWATWOL rules

Two-level rules have been written mainly for handling morphophonological processes, which occur principally at morpheme boundaries. Some such processes also take place in verbal extensions, whereby the quality of the stem vowel(s) defines the surface form of the suffix. The total number of rules is 18, some of them combined rules. An example of a combined rule:

Similar resources

The Consequences of the Contacts between Bantu and Non-Bantu Languages around Lake Eyasi in Northern Tanzania

In rural Tanzania, recent major influences happen between Kiswahili and English to ethnic languages rather than ethnic languages, which had been in contact for so long, influencing each other. In this work, I report the results of investigation of lexical changes in indigenous languages that aimed at examining how ethnic communities and their languages, namely Cushitic Iraqw, Nilotic Datooga, N...

Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs

The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...

Fsm2 and the Morphological Analysis of Bantu Nouns – First Experiences from Runyakitara

This paper describes the implementation of Finite State methods, fsm2 in particular, in automatic analysis of Bantu nouns in one of the under resourced languages, Runyakitara. This is the first effort towards computational analysis of Runyakitara. A detailed description of Runyakitara noun classes and how they were analysed using fsm2 is given. In the current state of developing the system, 80%...

Morphological analysis for less-resourced languages: Maximum Affix Overlap applied to Zulu

The paper describes a collaboration approach in progress for morphological analysis of less-resourced languages. The approach is based on firstly, a language-independent machine learning algorithm, Maximum Affix Overlap, that generates candidates for morphological decompositions from an initial set of language-specific training data; and secondly, language-dependent post-processing using langua...

An Analysis of Metaphoric Use of Names of Body Parts in the Bantu Language Kifipa

This paper focused on the way names of body parts are artistically used to convey meanings and messages in Kifipa, a Bantu language spoken in Tanzania. Since the body parts metaphors are used by people to portray meanings in their daily conversations (Kovecses, 2004; Vierke, 2012), the paper investigated such linguistic richness in the language. Methodologically, the study identified names of b...


Publication date: 1996